
Full-Malaria/Parasites and Full-Arthropods: databases of full-length cDNAs of parasites and arthropods, update 2009

Author: Bentley
E. Kibukawa
H. Wakaguri
Holt
J. Watanabe
K. Hiranuka
M. Sasaki
S. Kawashima
S. Sugano
Suzuki
T. Katayama
Y. Suzuki
Publication venue: Oxford University Press

Full-Malaria/Parasites is a database for transcriptome studies of apicomplexa and other parasites, based on our original full-length cDNA sequences and physical cDNA clone resources. In this update, the database has been expanded to include shotgun sequencing of the entire sequences of 14 818 non-redundant full-length cDNA clones from six apicomplexa parasites and 6.8 million transcription start sites (TSSs), both produced by novel protocols using the oligo-capping method and the Illumina GA sequencer. The former should provide definitive data for exact annotation of the expressed genes, while the latter should be useful for ultra-deep expression analysis. Furthermore, we have launched Full-Arthropods, a full-length cDNA database for arthropods of medical importance. Full-Arthropods contains 50 343 one-pass sequences, 10 399 shotgun complete sequences and 22.4 million TSS tags from Anopheles mosquitoes that transmit malaria, tsetse flies that transmit trypanosomiasis and dust mites that cause allergic dermatitis and bronchial asthma. By providing the largest integrated full-length cDNA data resources for apicomplexa parasites and their vectors, Full-Malaria/Parasites and Full-Arthropods should help combat parasitic diseases. Full-Malaria/Parasites and Full-Arthropods are accessible from http://fullmal.hgc.jp/

Features of mammalian microRNA promoters emerge from polymerase II chromatin immunoprecipitation data

Author: A Bird
A Marson
A Rodriguez
A Sandelin
AP Bird
Arindam Bhattacharjee
Ben Gordon
CD Schmid
Christopher K. Patil
D Karolchik
David L. Corcoran
DL Corcoran
DP Bartel
DS Prestridge
E Wingender
F Ozsolak
GD Stormo
GG Loots
GM Borchert
H Wakaguri
HJ Bussemaker
HK Saini
I Rigoutsos
IP Ioshikhes
J Taylor
J van Helden
K Woods
KD Taganov
Kusum V. Pandit
M Gardiner-Garden
M Megraw
MJ Buck
MP Brown
N Liu
Naftali Kaminski
NJ Martinez
O Chapelle
P Carninci
P Jin
Panayiotis V. Benos
R Gangal
R Shalgi
RM Kuhn
S Baskerville
S Fujita
S Mahony
SJ Cooper
T Abeel
T Thum
T Wang
TA Down
U Ohler
WJ Kent
X Zhao
X Zhou
Y Lee
Publication venue: Public Library of Science (PLoS)
Publication date: 01/04/2009

Background: MicroRNAs (miRNAs) are short, non-coding RNA regulators of protein coding genes. miRNAs play a very important role in diverse biological processes and various diseases. Many algorithms can predict miRNA genes and their targets, but the transcriptional regulation of miRNAs is still under investigation. It is generally believed that intragenic miRNAs (located in introns or exons of protein coding genes) are co-transcribed with their host genes and that most intergenic miRNAs are transcribed from their own RNA polymerase II (Pol II) promoters. However, the length of the primary transcripts and the organization of their promoters are currently unknown. Methodology: We performed Pol II chromatin immunoprecipitation (ChIP)-chip using a custom array surrounding regions of known miRNA genes. To identify the true core transcription start sites of the miRNA genes, we developed a new tool (CPPP). We showed that miRNA genes can be transcribed from promoters located several kilobases away and that their promoters share the same general features as those of protein coding genes. Finally, we found evidence that as many as 26% of the intragenic miRNAs may be transcribed from their own unique promoters. Conclusion: miRNA promoters have similar features to those of protein coding genes, but miRNA transcript organization is more complex. © 2009 Corcoran et al

D-Scholarship@Pitt

Analysis of Gene Regulatory Networks in the Mammalian Circadian Rhythm

Author: AE Kel
AI Su
B Kornmann
BH Miller
BR Zeeberg
Chunxuan Shao
FO James
G Thijs
GZ Hertz
H Wakaguri
Haifang Wang
HR Ueda
Jeffrey M. Gimble
Jun Yan
K Bozek
KD Pruitt
L Yin
M Rakhshandehroo
P Carninci
S Aerts
S Panda
S Rahmann
SM Reppert
V Porterfield
Y Suzuki
Yuting Liu
Publication venue: Public Library of Science
Publication date: 01/10/2008

Circadian rhythm is fundamental in regulating a wide range of cellular, metabolic, physiological, and behavioral activities in mammals. Although a small number of key circadian genes have been identified through extensive molecular and genetic studies, whether other key circadian genes exist and how they drive the genome-wide circadian oscillation of gene expression in different tissues remain unknown. Here we address these questions by integrating all available circadian microarray data in mammals. We identified 41 common circadian genes that showed circadian oscillation in a wide range of mouse tissues, with remarkably consistent circadian phases across tissues. Comparisons across mouse, rat, rhesus macaque, and human showed that the circadian phases of known key circadian genes were delayed by 4–5 hours in rat and by 8–12 hours in macaque and human, compared to mouse. A systematic gene regulatory network for the mouse circadian rhythm was constructed by incorporating promoter analysis and transcription factor knockout or mutant microarray data. We observed a significant association of the cis-regulatory elements EBOX, DBOX, RRE, and HSE with the different phases of circadian oscillating genes. The analysis of the network structure revealed the paths through which light, food, and heat can entrain the circadian clock and showed that NR3C1 and FKBP/HSP90 complexes are central to the control of circadian genes through diverse environmental signals. Our study improves our understanding of the structure, design principle, and evolution of gene regulatory networks involved in the mammalian circadian rhythm.
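The abstract compares circadian phases across species. As a rough illustration only, the sketch below estimates a gene's phase (peak time) with a single-component cosinor fit and reports the delay between two rhythms, wrapped to a 24 h cycle. It is not the study's method: the 24 h period, the evenly spaced sampling over whole cycles (which lets least squares reduce to simple sums), and the synthetic mouse and rat profiles are all assumptions.

```python
import math

PERIOD_H = 24.0  # assumed period of the rhythm, in hours


def cosinor_phase(times_h, values):
    """Estimate the peak time (acrophase, in hours) of a 24 h rhythm.

    Fits y = M + b*cos(w*t) + g*sin(w*t). With samples spaced evenly over
    whole periods (an assumption), the least-squares b and g reduce to the
    sums below, and the peak time is atan2(g, b) / w.
    """
    n = len(times_h)
    w = 2 * math.pi / PERIOD_H
    b = 2.0 / n * sum(y * math.cos(w * t) for t, y in zip(times_h, values))
    g = 2.0 / n * sum(y * math.sin(w * t) for t, y in zip(times_h, values))
    return (math.atan2(g, b) / w) % PERIOD_H


def phase_delay(phase_ref, phase_other):
    """Delay of one rhythm relative to a reference, wrapped to [0, 24)."""
    return (phase_other - phase_ref) % PERIOD_H


if __name__ == "__main__":
    # Synthetic profiles sampled every 4 h for 48 h (invented numbers).
    times = [4 * k for k in range(12)]
    mouse = [5 + 2 * math.cos(2 * math.pi * (t - 6.0) / 24) for t in times]
    rat = [5 + 2 * math.cos(2 * math.pi * (t - 10.5) / 24) for t in times]
    delay = phase_delay(cosinor_phase(times, mouse), cosinor_phase(times, rat))
    print(f"rat peak is delayed by {delay:.1f} h relative to mouse")
```

Wrapping with the modulo keeps a peak at 02:00 counted as 4 h after a peak at 22:00, rather than 20 h before it.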

Translog, a web browser for studying the expression divergence of homologous genes

Author: A Mortazavi
A Sandelin
A Visel
AJ Vilella
Altuna Akalin
B Papp
Boris Lenhard
C Park
D Zheng
GA Wray
H Kikuta
H Suzuki
H Wakaguri
KD Makova
KJ Kolell
KP White
KS Kassahn
KS Pollard
M de Hoon
M Ha
MC King
MJ West-Eberhard
MS Taylor
P Callaerts
P Flicek
P Khaitovich
PG Engstrom
RM Kuhn
S De
S Haider
S Schwartz
SB Carroll
SF Levy
TJ Hubbard
WJ Gehring
X Dong
X Gu
Xianjun Dong
Yogita Sharma
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: An increasing amount of data from comparative genomics, together with newly developed technologies that produce accurate gene expression data, facilitates the study of the expression divergence of homologous genes. Previous studies have individually highlighted factors that contribute to the expression divergence of duplicate genes, e.g. promoter changes, exon structure heterogeneity, asymmetric histone modifications and genomic neighborhood conservation. However, there is no tool that integrates multiple factors and visualizes their variation among homologous genes in a straightforward way. Results: We introduce Translog (a web-based tool for Transcriptome comparison of homologous genes), which assists in the comparison of homologous genes by displaying the loci in three different views: a promoter view for studying the sharing and turnover of transcription initiation sites, an exon structure view for displaying exon-intron structure changes, and a genomic neighborhood view showing macro-synteny conservation on a larger scale. CAGE data for transcription initiation are mapped for each transcript and can be used to study transcription turnover and expression changes. Alignment anchors between homologous loci can be used to define the precise homologous transcripts. We demonstrate how these views can be used to visualize the changes of homologous genes during evolution, particularly after the 2R and 3R whole genome duplications. Conclusion: We have developed a web-based tool that assists in the transcriptome comparison of homologous genes, facilitating the study of expression divergence.

University of Bergen

Springer - Publisher Connector

NORA - Norwegian Open Research Archives

MDC Repository

MGEx-Udb: A Mammalian Uterus Database for Expression-Based Cataloguing of Genes across Conditions, Including Endometriosis and Cervical Cancer

Author: A Balasubramanian
A Brazma
A Jemal
A Subramanian
Akhilesh K. Bajpai
C Stark
D Warde-Farley
Darshan S. Chandrashekar
DR Rhodes
E Taylor
H Choi
H Parkinson
H Wakaguri
HW Chen
J Chen
J Hubble
K Ikeo
KK Acharya
Kshitish K. Acharya
M Ashburner
M Magrane
Mahalakshmi Dinakaran
MS Boguski
O Morozova
PT Spellman
R Klaes
RA Irizarry
S Wray
SA Ochsner
Selvarajan Ilakya
SJ Xiao
SM Agarwal
Sravanthi Davuluri
T Barrett
T Werner
TF Rayner
V Pihur
X Kong
X Liu
Zhanjiang Liu
Publication venue: Public Library of Science
Publication date: 11/05/2012

Gene expression profiling of uterus tissue has been performed in various contexts, but a significant amount of the data remains underutilized because it is not covered by existing general resources. The database can be queried with gene names/IDs, sub-tissue locations, and various conditions such as cervical cancer, endometrial cycles and disorders, and experimental treatments. Accordingly, the output is either (a) transcribed and dormant genes listed for the queried condition/location, or (b) the expression profile of the gene of interest in various uterine conditions. The results also include a reliability score for the expression status of each gene. MGEx-Udb also provides information related to Gene Ontology annotations, protein-protein interactions, transcripts, promoters, and expression status by other sequencing techniques, and facilitates various other types of analysis of individual genes or co-expressed gene clusters. In brief, MGEx-Udb enables easy cataloguing of co-expressed genes and also facilitates biomarker discovery for various uterine conditions.

The Long March: A Sample Preparation Technique that Enhances Contig Length and Coverage by High-Throughput Short-Read Sequencing

Author: A Janulaitis
AF Siegel
Armin Hekele
Charles Chiu
CJ Stoeckert Jr
CT Wai
Dale Webster
ER Mardis
F Mashayekhi
F Mathieu-Daude
F Sanger
H Okamoto
H Wakaguri
J Shendure
J. Graham Ruby
JO Korbel
Joseph L. DeRisi
Katherine Sorber
KE Wommack
M Chaisson
M Hafner
M Petrusyte
M Pop
M Ronaghi
Mark A. Batzer
Michelle Dimon
MJ Gardner
N Whiteford
O Salas-Solano
R Knight
RA Holt
RJ Roberts
RL Warren
SF Altschul
SM Hadi
TS Seo
Publication venue: Public Library of Science
Publication date: 01/01/2008

High-throughput short-read technologies have revolutionized DNA sequencing by drastically reducing the cost per base of sequencing information. Despite producing gigabases of sequence per run, these technologies still present obstacles in resequencing and de novo assembly applications due to biased or insufficient target sequence coverage. We present here a simple sample preparation method termed the “long march” that increases both contig lengths and target sequence coverage using high-throughput short-read technologies. By incorporating a Type IIS restriction enzyme recognition motif into the sequencing primer adapter, successive rounds of restriction enzyme cleavage and adapter ligation produce a set of nested sub-libraries from the initial amplicon library. Sequence reads from these sub-libraries are offset from each other with enough overlap to aid assembly and contig extension. We demonstrate the utility of the long march in resequencing of the Plasmodium falciparum transcriptome, where the number of genomic bases covered was increased by 39%, as well as in metagenomic analysis of a serum sample from a patient with hepatitis B virus (HBV)-related acute liver failure, where the number of HBV bases covered was increased by 42%. We also offer a theoretical optimization of the long march for de novo sequence assembly
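The nested sub-library idea can be pictured with a toy coverage model. The sketch assumes each round of Type IIS cleavage trims a fixed number of bases from the adapter end of every amplicon, so reads from round r start r × trim bases further in. The 500 bp amplicon, 36-base reads and 20-base trim in the example are illustrative assumptions, not parameters from the study, and the model ignores the coverage biases the method is meant to address. It is not the authors' theoretical optimization.

```python
def read_starts(rounds, trim_per_round):
    """Read start offsets (from the amplicon end) for the original library
    (round 0) and each nested sub-library produced by `rounds` cleavages."""
    return [r * trim_per_round for r in range(rounds + 1)]


def covered_bases(amplicon_len, read_len, rounds, trim_per_round, both_ends=True):
    """Count amplicon positions covered by at least one read in the toy model."""
    covered = set()
    for start in read_starts(rounds, trim_per_round):
        for pos in range(start, min(start + read_len, amplicon_len)):
            covered.add(pos)  # read primed from the left end
            if both_ends:
                covered.add(amplicon_len - 1 - pos)  # mirror read from the right end
    return len(covered)


def overlap_between_rounds(read_len, trim_per_round):
    """Bases shared by reads from consecutive sub-libraries (aids assembly)."""
    return max(read_len - trim_per_round, 0)


if __name__ == "__main__":
    # Illustrative values only: 500 bp amplicon, 36 nt reads, 20 nt trim per round.
    for rounds in (0, 3, 6):
        print(rounds, covered_bases(500, 36, rounds, 20))
    print("overlap:", overlap_between_rounds(36, 20))
```

The trade-off the model makes visible: a larger trim extends coverage faster per round but shrinks the overlap between consecutive reads, which is what assembly relies on.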

CiteSeerX

eScholarship - University of California

High Sensitivity TSS Prediction: Estimates of Locations Where TSS Cannot Occur

Author: A Kanhere
C Wei
Chikatoshi Kai
GJ McLachlan
H Kawaji
H Wakaguri
I Korf
I Ovcharenko
JL Rinn
Jun Kawai
K Maruyama
L Ponger
MC Frith
MG Reese
N Cohen
P Carninci
P Kapranov
P Ng
Piero Carninci
Rimantas Kodzius
RV Davuluri
S Hashimoto
S Knudsen
T Shiraki
TA Down
Timothy Ravasi
U Ohler
Ulf Schaefer
VB Bajic
Vladimir B. Bajic
VV Solovyev
Y Sugahara
Yoshihide Hayashizaki
Publication venue: Public Library of Science
Publication date: 15/11/2010

Although transcription in mammalian genomes can initiate from various genomic positions (e.g., 3′UTRs, coding exons, etc.), most genomic locations are not prone to transcription initiation. It is of practical and theoretical interest to be able to estimate such collections of non-TSS locations (NTLs). Identifying large portions of NTLs can help focus the search for TSS locations and thus contribute to promoter and gene finding. It can also help in assessing the 5′ completeness of expressed sequences, and contribute to more successful experimental designs and more accurate gene annotation.

Using comprehensive collections of Cap Analysis of Gene Expression (CAGE) and other transcript data from mouse and human genomes, we developed a methodology that, by performing computational TSS prediction with very high sensitivity, allows us to annotate with high accuracy, in a strand-specific manner, locations of mammalian genomes that are highly unlikely to harbor transcription start sites (TSSs). The properties of the immediate genomic neighborhood of 98,682 accurately determined mouse and 113,814 human TSSs are used to determine features that distinguish genomic transcription initiation locations from those that are not likely to initiate transcription. In our algorithm we use various constraining properties of features identified in the upstream and downstream regions around TSSs, as well as statistical analyses of these surrounding regions.
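One way to read the high-sensitivity idea: choose a predictor cutoff low enough that (nearly) all known TSSs score above it, then label every position scoring below it as a non-TSS location. The sketch assumes some hypothetical TSS scorer has already produced one score per (position, strand); it shows only this thresholding logic, not the authors' feature-based algorithm, and all scores are invented.

```python
import math


def sensitivity_threshold(known_tss_scores, target_sensitivity=1.0):
    """Highest score cutoff that still keeps `target_sensitivity` of known TSSs.

    With target_sensitivity=1.0 this is simply the lowest known-TSS score.
    """
    if not known_tss_scores or not 0 < target_sensitivity <= 1:
        raise ValueError("need scores and a sensitivity in (0, 1]")
    ranked = sorted(known_tss_scores, reverse=True)
    # Number of known TSSs that must pass; the tiny tolerance stops float
    # round-up (e.g. 0.7 * 10 == 7.000000000000001) from adding an extra one.
    k = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    return ranked[k - 1]


def non_tss_locations(site_scores, threshold):
    """Return (position, strand) keys whose predictor score falls below threshold."""
    return [site for site, score in site_scores.items() if score < threshold]


if __name__ == "__main__":
    # Hypothetical predictor scores; real ones would come from a TSS model.
    known = [0.9, 0.7, 0.4, 0.8]
    sites = {(100, "+"): 0.1, (100, "-"): 0.5, (300, "+"): 0.35, (400, "-"): 0.95}
    cutoff = sensitivity_threshold(known)
    print(cutoff, non_tss_locations(sites, cutoff))
```

Keying sites by strand mirrors the strand-specific annotation described above: the same coordinate can be an NTL on one strand and not on the other.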

Identification of gene co-regulatory modules and associated cis-elements involved in degenerative heart disease

Author: A Subramanian
AI Su
Arkady M Pertsov
AS Barth
BJ Wilkins
C Danko
C Kioussi
Charles G Danko
DW Jeong
E Segal
F Tan
F Wittchen
G Dennis
H Rindt
H Wakaguri
J Hwang
J Tian
J Wang
JA Towbin
JD Barrans
JL Hall
KA Dellow
LA Megeney
M Flesch
M Gupta
MA Beer
MB Eisen
MM Kittleson
MS Parmacek
OV Kel-Margoulis
PK Bhavsar
R Bassel-Duby
R Development Core Team
R Edgar
R Gentleman
R Grzeskowiak
RCG Holland
S Malik
T Sugimoto
TH Christensen
TJP Hubbard
VR Iyer
WE Johnson
X Xie
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Cardiomyopathies, degenerative diseases of cardiac muscle, are among the leading causes of death in the developed world. Microarray studies of cardiomyopathies have identified up to several hundred genes that significantly alter their expression patterns as the disease progresses. However, the regulatory mechanisms driving these changes, in particular the networks of transcription factors involved, remain poorly understood. Our goals are (A) to identify modules of co-regulated genes that undergo similar changes in expression in various types of cardiomyopathies, and (B) to reveal the specific pattern of transcription factor binding sites (cis-elements) in the proximal promoter region of genes comprising such modules. Methods: We analyzed 149 microarray samples from human hypertrophic and dilated cardiomyopathies of various etiologies. Hierarchical clustering and Gene Ontology annotations were applied to identify modules enriched in genes with highly correlated expression and a similar physiological function. To discover motifs that may underlie changes in expression, we used the promoter regions of genes in three of the most interesting modules as input to motif discovery algorithms. The resulting motifs were used to construct a probabilistic model predictive of changes in expression across different cardiomyopathies. Results: We found that the three modules with the highest degree of functional enrichment contain genes involved in myocardial contraction (n = 9), energy generation (n = 20), or protein translation (n = 20). Motif discovery revealed that genes in the contractile module contain a TATA-box followed by a CACC-box and are depleted in other GC-rich motifs, whereas genes in the translation module contain a pyrimidine-rich initiator, Elk-1, SP-1, and a novel motif with a GCGC core. A naïve Bayes classifier revealed that patterns of motifs are statistically predictive of expression patterns, with odds ratios of 2.7 (contractile), 1.9 (energy generation), and 5.5 (protein translation). Conclusion: We identified patterns of putative cis-regulatory motifs enriched in the upstream promoter sequence of genes that undergo similar changes in expression secondary to cardiomyopathies of various etiologies. Our analysis is a first step towards understanding the transcription factor networks that are active in regulating gene expression during degenerative heart disease.
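For readers unfamiliar with the classifier named in this abstract, here is a minimal Bernoulli naïve Bayes over motif presence/absence, plus an odds ratio of predicted versus observed module membership. The feature encoding, the Laplace smoothing, the Haldane-Anscombe correction (0.5 added to each cell) in the odds ratio and the toy motif vectors are all assumptions for illustration; the study's actual model and data are not reproduced here.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on 0/1 motif-presence vectors.

    X: list of feature vectors (1 = motif present in the promoter).
    y: list of labels (1 = gene in the module of interest, 0 = not).
    alpha: Laplace smoothing so unseen motifs never get probability 0.
    Both classes must occur in y, otherwise a prior of 0 breaks the log.
    """
    model = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        probs = [(sum(r[j] for r in rows) + alpha) / (len(rows) + 2 * alpha)
                 for j in range(len(X[0]))]
        model[c] = (prior, probs)
    return model


def log_odds(model, x):
    """log P(1 | x) - log P(0 | x), assuming motifs are independent given the class."""
    score = 0.0
    for c, sign in ((1, 1.0), (0, -1.0)):
        prior, probs = model[c]
        ll = math.log(prior)
        for present, p in zip(x, probs):
            ll += math.log(p if present else 1.0 - p)
        score += sign * ll
    return score


def predict(model, x):
    return 1 if log_odds(model, x) > 0 else 0


def odds_ratio(predicted, actual):
    """Odds ratio of the 2x2 predicted-vs-observed table (0.5 added to each cell)."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual)) + 0.5
    tn = sum(p == 0 and a == 0 for p, a in zip(predicted, actual)) + 0.5
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual)) + 0.5
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual)) + 0.5
    return (tp * tn) / (fp * fn)


if __name__ == "__main__":
    # Toy features: [TATA-box, CACC-box, GC-rich motif]; labels invented.
    X = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
    y = [1, 1, 1, 0, 0, 0]
    model = train_bernoulli_nb(X, y)
    preds = [predict(model, x) for x in X]
    print(preds, odds_ratio(preds, y))
```

Scoring in log space avoids underflow when many motifs are multiplied together; on real data the odds ratio would be computed on held-out genes rather than the training set used in this toy run.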

Springer - Publisher Connector

High-Resolution Characterization of Toxoplasma gondii Transcriptome with a Massive Parallel Sequencing Method

Author: A. Ueno
Ajioka
Arkhipova
Balaji
Behnke
Blaustein
Bohne
Burg
Butler
C. Sugimoto
Callebaut
Chalkley
Cleary
Corden
Davuluri
Dawson
Deng
Dzierszinski
H. Wakaguri
Hertz
Hultmark
Iyer
J. Watanabe
J. Yamagishi
Javahery
Jeong
Jin
Kadonaga
Kawase
Kibe
Lim
M. Igarashi
M. Tolba
Manger
Matrajt
Mercier
Mullapudi
Nakaar
Parmley
Radke
S. Sugano
Seeber
Singh
Smale
Soldati
Struhl
Suzuki
X. Xuan
Xia
Y. Nishikawa
Y. Suzuki
Y.-K. Goo
Yamamoto
Yamamoto
Yook
Publication venue: Oxford University Press

Over the last couple of years, a method has been established that collects precise positional information on transcriptional start sites (TSSs), together with digital information on gene-expression levels, in a high-throughput manner. We applied this novel method, ‘tss-seq’, to elucidate the transcriptome of Toxoplasma gondii tachyzoites, identifying 124 000 TSSs, which were clustered into 10 000 transcription regions (TRs) by a statistics-based analysis. Pairing the TRs with annotated ORFs showed that 30% of the TRs and 40% of the ORFs had no counterpart, suggesting undiscovered genes and stage-specific transcription, respectively. The massive TSS data made it possible to perform the first systematic analysis of the T. gondii core promoter structure, which showed that T. gondii mainly uses an initiator-like motif for transcription, along with a novel motif, the downstream thymidine cluster, which resembles the Y patch observed in plants. This encyclopaedic analysis also suggested that the TATA box and other well-known core promoter elements are rarely used.
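The TSS-to-TR-to-ORF workflow described in this abstract can be sketched as follows. The abstract specifies a statistics-based clustering, which is not reproduced here; instead, this hypothetical version chains same-strand tags lying no more than `max_gap` bases apart, keeps clusters supported by at least `min_tags` tags, and pairs each region with the nearest downstream ORF start within `window` bases. All parameter values, names and toy coordinates are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A transcription region (TR): a cluster of same-strand TSS tags."""
    strand: str
    start: int
    end: int
    tags: int


def cluster_tss(tags, max_gap=50, min_tags=2):
    """Merge (strand, position, count) TSS tags into regions.

    Same-strand tags no more than `max_gap` bases apart are chained into one
    region; regions supported by fewer than `min_tags` tags are dropped.
    """
    regions = []
    for strand in ("+", "-"):
        hits = sorted((pos, count) for s, pos, count in tags if s == strand)
        current = None
        for pos, count in hits:
            if current is not None and pos - current.end <= max_gap:
                current.end = pos
                current.tags += count
                continue
            if current is not None and current.tags >= min_tags:
                regions.append(current)
            current = Region(strand, pos, pos, count)
        if current is not None and current.tags >= min_tags:
            regions.append(current)
    return regions


def pair_with_orfs(regions, orfs, window=1000):
    """Map each region index to the nearest downstream ORF start (or None).

    orfs: dict of name -> (strand, start-codon coordinate).
    """
    pairs = {}
    for i, region in enumerate(regions):
        best_name, best_dist = None, None
        for name, (strand, start) in orfs.items():
            if strand != region.strand:
                continue
            # "Downstream" means higher coordinates on +, lower on -.
            dist = start - region.end if strand == "+" else region.start - start
            if 0 <= dist <= window and (best_dist is None or dist < best_dist):
                best_name, best_dist = name, dist
        pairs[i] = best_name
    return pairs


if __name__ == "__main__":
    tags = [("+", 100, 3), ("+", 120, 2), ("+", 400, 1), ("-", 900, 5), ("-", 930, 1)]
    orfs = {"orfA": ("+", 300), "orfB": ("-", 500), "orfC": ("+", 5000)}
    regions = cluster_tss(tags)
    pairs = pair_with_orfs(regions, orfs)
    unpaired_trs = [i for i, name in pairs.items() if name is None]
    unpaired_orfs = set(orfs) - set(pairs.values())
    print(regions, pairs, unpaired_trs, unpaired_orfs)
```

In this toy run `orfC` ends up without a paired region; in the abstract's terms, ORFs like this are candidates for stage-specific transcription, while unpaired regions would be candidates for undiscovered genes.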